Extraction of the Multi-word Lexical Units in the Perspective of the Wordnet Expansion
Authors
Abstract
The paper focuses on selecting an optimal set of Multiword Expression extraction methods to be used as a tool during wordnet expansion. Wordnet multiword lexical units form a broad class, and it is difficult to find a single extraction method that covers it. Many association measures for extraction were tested on very large corpora and on a very large wordnet, namely plWordNet. Several new measures are proposed and compared with selected methods from the literature. Two ways of combining measures into ensembles were also analysed. We showed that the selection of methods and the tuning of their parameters can be transferred between two large corpora. A comparison of the extracted collocations with the very large set of plWordNet multiword lexical units revealed that the performance of the methods is well below the optimistic levels reported in the literature. However, a carefully selected and combined set of methods can be a valuable tool for lexicographers.
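As a rough illustration of what an association measure and an ensemble of measures look like in practice, the sketch below scores adjacent word pairs with pointwise mutual information (PMI) and with raw frequency, then combines the two by averaging ranks. The abstract does not name the measures or the combination schemes that were actually tested, so PMI, raw frequency, rank averaging, and the function names `pmi_scores`, `frequency_scores` and `rank_average` are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustrative stand-in: PMI is a common association measure, not
    necessarily one of the measures evaluated in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def frequency_scores(tokens, min_count=2):
    """Score adjacent word pairs by raw frequency (a simple baseline)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {pair: float(c) for pair, c in bigrams.items() if c >= min_count}


def rank_average(*score_dicts):
    """Combine measures by averaging each candidate's rank (1 = best).

    Hypothetical ensemble scheme; only candidates scored by every
    measure are ranked. Ties are broken alphabetically.
    """
    candidates = set.intersection(*(set(d) for d in score_dicts))
    totals = {c: 0.0 for c in candidates}
    for d in score_dicts:
        ordered = sorted(candidates, key=lambda c: (-d[c], c))
        for rank, c in enumerate(ordered, start=1):
            totals[c] += rank
    k = len(score_dicts)
    return sorted(candidates, key=lambda c: (totals[c] / k, c))


tokens = "new york is big new york is far big city".split()
ranking = rank_average(pmi_scores(tokens), frequency_scores(tokens))
```

In a lexicographic setting, the top of such a ranking would be handed to a human editor as candidate multiword lexical units rather than added to the wordnet automatically.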
Similar resources
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is mainly used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
A Corpus-based Study of Lexical Bundles in Discussion Section of Medical Research Articles
There has been increasing interest in utilizing corpora in linguistic research and pedagogy in recent years. Rhetorical organization of different sections of research articles may appear similar in various disciplines, but close examination may show subtle differences nonetheless. One of the features that has been at the center of attention especially in recent years is the idiomaticity of a di...
A Procedural Definition of Multi-word Lexical Units
Multi-word expressions evade a closed definition. Linguists and computational linguists rely on intuition or build lists of MWE types; while practical, that is scientifically and aesthetically unsatisfying. Without presuming to solve a daunting theoretical problem, we propose a decision procedure which steers a lexicographer toward acceptance or rejection of an N-gram as a lexical unit: a decis...
On the Role of Derivational Processes in the Formation of Non-Taxonomic Classes of Lexical Units in Russian
The paper is focused on classes of lexical units which arise as a result of derivational processes – word formation and semantic transfers, acting either in isolation or together, on the basis of common semantic foundations that bind targets and sources of derivation. The lexical items which constitute the classes under study vary in their denotative characteristics and due to their categ...